System Performance and Capacity Planning Using SAS Software

Author

  • Thomas G. Confrey
Abstract

This paper discusses a system for capturing and analyzing data for large IBM 3090-type mainframes. The data is retrieved from records produced by IBM's Resource Measurement Facility (RMF), IBM's DFSMS Data Collection Facility (DCOLLECT), and Landmark's CICS Monitor. SAS programs are used to read, summarize, and condense this information and store it in a database. Once stored, the information can be easily retrieved and displayed in various ways (numerically and graphically) to aid in performance management and capacity planning. Some of the performance indicators discussed are: CPU utilization, TSO response time, transaction rates, paging rates, information related to active data storage, and CICS response time. Also discussed are various ways to capture and view CPU utilization for computer complexes that run in multi-image mode (e.g. PR/SM). The complete system is available on a public domain tape.

BACKGROUND

We first decided to use SAS* software to process system data in an effort to simplify analysis of certain key performance indicators. The data of choice was that of IBM's Resource Measurement Facility* (RMF), a measurement collection tool used to monitor system activity. The system initially handled such things as CPU utilization, paging, and workload activity, because they could be readily calculated from RMF records. The scope of the system was expanded over time, first with more information from the RMF 70-79 records, and then with information from CICS monitors. The most recent addition is data from IBM's Data Collection Facility* (DCOLLECT) utility.

The methodology outlined here extracts information from records produced by various sources and stores that information in historical SAS data sets in both raw and summarized form. The data is maintained and processed by a collection of user-written SAS programs. An overview of the system and examples of how the data can be used are discussed below.
The complete system, including source programs and directions, is currently contained on the public domain CBT/MVS Mods tape [1]. The programs will handle data from MVS/ESA* (release 3.1.3) and CICS* 1.7 and 2.1 systems. (With a few minor adjustments MVS/XA* data can be processed.) Comments and suggestions are welcome.

RMF DATA

RMF runs as a started task and has both on-line and batch capabilities [2]. The data collected by RMF is written to the System Management Facilities (SMF) data sets [3] and is used by RMF's post processor to produce summaries and reports. Another way to process these records is with SAS programs. This makes it easy to do statistical analysis and to produce graphs, forecasts, and trend lines, which should facilitate the analysis of system performance and help in capacity planning.

At the start of each week, SMF tapes from the previous week are read and all RMF records (types 70-79) are extracted onto a weekly disk file. The records are then sorted so that they can be used by the RMF post processor. (The post processor requires that the records be in order by interval start date and interval start time.) The same procedure is followed at the end of the month for the month's RMF records.

It is at this point that the system uses the sorted raw RMF records. Several SAS programs are run against this file, selecting, summarizing, and manipulating certain information and then storing it in a SAS data set. (SAS data sets are direct access data sets that are similar to partitioned data sets but managed by SAS software.) There is a record for each RMF interval recorded. Records are grouped by record type and time period and stored as a "member" of the SAS data set. (Refer to Figure 1.) For example, for the first week of March there would be one member with information from all the '70' records, another member with data from all the '71' records, and lastly, a member with data from all the '72' records.
If RMF was set up to collect data for one-hour intervals, then there would be 24 records per day for each record type. For a 7-day period this would be approximately 168 records per member. (This number could be more or less than 168, depending on such factors as how many times RMF was stopped and then started, and whether or not the operating system came down or was IPL'ed.) The data is further segregated by system. The information is stored by month and also by week for the current month. For example, if we had 2 systems and we started recording in January, then on April 14th there would be the following collection of members: 24 monthly members ([3 record types] x [2 systems] x [4 months] = 24) and 12 weekly members ([3 record types] x [2 systems] x [2 weeks] = 12). We will refer to this stored information in the SAS data set hereafter as the database. (Figure 2.)

LOGICAL AND PHYSICAL PARTITIONS

Since there are machines that can be configured with physical partitions (PP) and logical partitions (LPAR), with each partition executing its own copy of an operating system, it became necessary to track CPU utilization not only by operating system but also by physical machine. (A physical machine is also referred to as a "physical processor complex", or PCP. For example, one 3090-600S, no matter how it is configured.) New SAS programs were written which allowed us to keep track of total CPU utilization for a PCP without having to worry about whether it was physically or logically partitioned or how many logical CPUs were in use.

This is accomplished with a user-written SAS program (CPUT), which summarizes RMF data from each operating system and stores it in the database under the name 'ssCPyymm', where ss = a system identifier, yy = the last 2 digits of the year, and mm = the month. An example would be S1CP9004 for the month of April 1991 for system S1. This data is then read by program CPTOT, which combines the data to show utilization for the whole machine.
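
The member bookkeeping described above comes down to simple arithmetic and string formatting. The following Python sketch is only an illustration and is not part of the paper's SAS system; the function names are hypothetical. It reproduces the record and member counts from the April 14th example and builds a name that follows the 'ssCPyymm' pattern.

```python
def member_name(system_id: str, year: int, month: int) -> str:
    """Build a name of the form 'ssCPyymm' (hypothetical helper).

    system_id is the two-character system identifier, year is a full
    four-digit year, and month is 1-12.
    """
    return f"{system_id}CP{year % 100:02d}{month:02d}"


def member_count(record_types: int, systems: int, periods: int) -> int:
    """Number of members: one per record type, per system, per period."""
    return record_types * systems * periods


# One-hour RMF intervals give 24 records per day, so about 168 per week.
records_per_week = 24 * 7

# The example from the text: 3 record types, 2 systems.
monthly_members = member_count(3, 2, 4)  # 4 months -> 24 members
weekly_members = member_count(3, 2, 2)   # 2 weeks  -> 12 members

# An illustrative name for system S1, March 1991.
example_name = member_name("S1", 1991, 3)
```

The counts are upper bounds on regularity rather than guarantees, since, as noted above, restarts of RMF or an IPL change the number of records actually written.
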
Input for CPTOT is all the RMF 70 records for those systems that ran on that physical machine. Since our systems run under the control of PR/SM* (i.e. LPAR), they will have their utilization contained in each one of the different systems' RMF 72 records. Therefore, the program must first determine the operational mode of the machine. For LPAR mode, only records from one operating system are used, unless that system was down for some time interval, in which case the information for that interval is retrieved from the other operating system's records. For physical partition mode the program combines the CPU data. For single image mode, it will only read one set of records. The program creates SAS member UTM1yymm. (E.g. UTM19103 = March 1991 utilization for machine #1.)

CPU UTILIZATION

There are various ways to display CPU utilization. One is to graph the average CPU utilization and then enhance it by adding a regression line. By extending this line one can get a feel for what future utilization might be, based on past utilization. We report CPU utilization by operating system and also by PCP.

Average utilization by itself, however, can sometimes be misleading. A system might have an average utilization of 60% for the day but have peaks of 90% utilization. Therefore, a method is necessary to track peak utilizations. Figure 3 is similar to the average utilization graph previously mentioned, except that it charts the maximum CPU utilization per day. A regression line is also plotted.

Another perspective of utilization is a graph that shows the minimum, average, and maximum utilization by hour for the week. (Figure 4.) A variation of this graph is one which shows the average utilization for each hour for each day, all contained on a single graph.

The previous graphs are all useful, but they lack the ability to give a visual indication of how often a machine reaches a certain utilization level or how much it varies around the average (e.g. standard deviation).
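
The regression line added to the utilization graphs can be illustrated with an ordinary least-squares fit. The Python sketch below is an assumption about the general technique, not the paper's SAS code; the function names and the utilization figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extend the fitted line to a future x (e.g. a later day number)."""
    return slope * x + intercept


# Made-up daily average CPU utilization (%) for days 0..5.
days = [0, 1, 2, 3, 4, 5]
avg_util = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_line(days, avg_util)
projected_day_10 = predict(slope, intercept, 10)
```

The same fit can be applied to the daily maximum instead of the daily average to obtain the trend of peak utilization.
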
One solution to this problem is a bar graph that presents total CPU utilization for a processor complex in one continuous form. (Figure 5.) The y-axis is CPU utilization and the x-axis is each RMF interval (normally 1/2 hour) from 9 a.m. to 4 p.m., Monday through Friday. From this one can get a better understanding of total machine utilization. This graph can be supplemented with a printed report that lists the average CPU utilization by day as well as the maximum machine utilization. An improvement on this graph is the same graph with each bar subdivided by the amount of utilization that each partition contributed to the overall total.

Still another way to view this data is to rearrange it and present it in a 3-dimensional graph for a longer time period, say 6 months. (Figure 6.) To make it easier to interpret, different symbols are used to denote different CPU utilization ranges. Since a 3-dimensional graph can have certain points obscured, depending on the data, we normally present the graph from three different reference points. (Only one is shown.) Not only does this graph point out how often peak utilization is reached, but also at what time of day certain utilizations are concentrated. Interpretation is made easier by varying the range of the symbols and by producing the graph in color.

Since regression lines are based on the past, they cannot account for new work that is entered into the system. For this one must review users' plans for projected work. One should also examine, from time to time, the statistical calculations that can be produced with these graphs to see how good a fit the regression line is.

It is also important to note that for installations using PR/SM, RMF includes the overhead of the LPAR dispatcher in its utilization figures. This, plus the low utilization effect of PR/SM (when there is less work to dispatch there is more overhead), makes it difficult to do capacity planning.
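
The 3-dimensional view described above, in which symbols denote utilization ranges by time of day, rests on a simple tabulation: assign each RMF interval to a range and count intervals per hour and range. The Python sketch below shows one way to do that. The range boundaries, labels, and sample values are hypothetical, since the paper does not state which ranges were used.

```python
from collections import Counter

# Hypothetical utilization ranges: (low inclusive, high exclusive, label).
RANGES = [
    (0, 50, "0-49%"),
    (50, 75, "50-74%"),
    (75, 90, "75-89%"),
    (90, 101, "90-100%"),
]


def range_label(util):
    """Return the label of the range that contains a utilization value."""
    for low, high, label in RANGES:
        if low <= util < high:
            return label
    raise ValueError(f"utilization out of range: {util}")


def tabulate(intervals):
    """Count intervals per (hour of day, utilization range).

    intervals is an iterable of (hour, utilization_percent) pairs.
    """
    counts = Counter()
    for hour, util in intervals:
        counts[(hour, range_label(util))] += 1
    return counts


# Made-up interval data: (hour of day, CPU utilization %).
sample = [(9, 45.0), (10, 92.5), (10, 95.0), (14, 80.0), (10, 60.0)]
counts = tabulate(sample)
```

A table like `counts` answers both questions raised in the text: how often a given utilization level is reached, and at what hours those intervals cluster.
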
There are rumors that IBM will break out the overhead in future RMF versions, but until that time care has to be taken when interpreting utilization numbers.

PAGING

One useful way to look at paging is to plot the average pages per second for each day over a time span of six months. The graph can be enhanced by adding the maximum paging rate for the day. The average line is good for identifying trends that are increasing (or decreasing) memory requirements. If abnormally high peaks occur along the maximum line, they should be scrutinized to find the cause. Once again, a 3-D graph (pages/second by date by hour) could be used if one wanted to probe deeper into paging activity.

The advent of Expanded Storage (ES) has made it necessary to track paging information as it relates to this resource. ES is optional processor storage which MVS will use, among other things, as a place to put unused pages. Over time these pages may be migrated to auxiliary storage (DASD) if the amount of ES becomes overcommitted. It takes approximately 70 microseconds to retrieve a page from ES, and it takes some number of milliseconds to retrieve a page from auxiliary storage (depending on the device type and device utilization). This, and the fact that a page must be brought back into main storage before it can be transferred to auxiliary storage, makes it necessary to track additional indicators.

One of these is the average migration age. (The average number of seconds a page is in ES without a reference before it is migrated to auxiliary storage.) In many systems this number is very large and hard to quantify. But, obviously, the lower the number, the more ES is being used. Another indicator is the migration rate. (The number of pages per second that are migrating from ES to page data sets on auxiliary storage.) Figure 7 is a graph that plots the average migration rate and also the maximum rate for the day.
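
The daily average and maximum lines used for the paging and migration graphs both come from reducing per-interval rates to one pair of values per day. The Python sketch below illustrates that reduction under stated assumptions: the function name, the date format, and the sample rates are hypothetical and are not taken from the paper's SAS programs.

```python
from collections import defaultdict


def daily_avg_and_max(samples):
    """Reduce per-interval rates to {day: (average, maximum)}.

    samples is an iterable of (day, pages_per_second) pairs, one pair
    per RMF interval.
    """
    by_day = defaultdict(list)
    for day, rate in samples:
        by_day[day].append(rate)
    return {
        day: (sum(rates) / len(rates), max(rates))
        for day, rates in by_day.items()
    }


# Made-up migration rates (pages per second) for two days.
samples = [
    ("1991-03-04", 2.0),
    ("1991-03-04", 6.0),
    ("1991-03-05", 1.0),
    ("1991-03-05", 3.0),
    ("1991-03-05", 8.0),
]
summary = daily_avg_and_max(samples)
```

In this made-up data both days have the same average, yet the second day has a higher peak, which is the kind of difference the maximum line is meant to expose.
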
For those among us who track total paging rate, remember that it is slightly different with ES. The RMF reports make a distinction between pages to DASD and pages to and from ES.

TSO

Many installations have exits that automatically cancel users who have been inactive for a specified period of time. In these cases the average number of TSO users logged on is a good indicator of how many people are actively using that resource. The number of TSO users logged on during an RMF interval is recorded in three categories: the average number logged on, the minimum, and the maximum. A graph of this data is a good reference.

Figure 8 summarizes the average response time for TSO. It is broken down by performance group period as well as the average for all the periods. Period 1 usually generates considerable interest since it contains all transactions that are normally classified as trivial transactions. (They use the least amount of service units.)

DASD

Certain releases of MVS/DFP* now have the ability to produce records related to data set storage, storage media, and storage disposition. This function is referred to as the DFSMS Data Collection Facility (DCOLLECT) and is a new function in Access Method Services [4]. Simply stated, DCOLLECT will take a snapshot of all DASD that is on-line at the time the utility is run. The number of records written depends on such things as the type of information requested, the number of volumes on-line, the number of data sets on the volumes, etc. The different record types that can be produced are: Active Data Set (type D), Volume Information (V), VSAM Association (A), Migration Data Set (M), Data Set Backup Version (B), and DASD Capacity Planning (C). Reports with different degrees of detail can be produced. Information is provided for system-managed (SMS) environments, non-SMS environments, or environments that are combinations of both.
We have just begun using this utility, and at press time we have written SAS programs to handle the type D and V records. (We are slowly working our way down the list of record types.)

Figure 9 is a graphical representation of DASD capacity and use. The dotted line on top represents the amount of DASD that is currently installed. A rise in this line marks the point where installed DASD was increased. The 'Free' line denotes the total amount of space that is not allocated. The line labeled 'Allocated' reports the amount of space that has been allocated (which includes space that is allocated but not used). The 'Capacity' line is the sum of allocated and free space and is the total amount of DASD space. This number can be less than the total amount of installed DASD because some volumes may have been off-line at the time the records were created.

Use of the active data set records (type D) allows one to explore at the data set level. Figure 10 summarizes the top 10 owners of DASD space by department. Each bar consists of the total amount of allocated space that a department (or project) has. The bar is further subdivided into used space and free space. In this case, free space is that part of the allocated space that is not being used.

One thing missing from our discussion of DASD is response time. We do report on it, but we don't use our own SAS programs. In the future we intend to write code to report on such things as average response time (broken down by the various response time components) for the volumes with the highest response times, and cache hit ratios.

CICS

A separate database was created to hold CICS information. That information originates from records produced by Landmark's The Monitor for CICS* V8.1 (TMON), which produces records similar to IBM's CMF*. Originally, all of the data was stored in one database. For convenience, however, the information is now stored in separate databases, one for each platform.
Each record summarizes one hour's worth of data. We attempted to look ahead to see how the data would be best used. As a result, in addition to storing the total number of transactions and the average response time, we also break down the number of transactions that complete in a certain time period (e.g. 1 second, 3 seconds, 5 seconds, etc.), CPU time, wait time, DL/I counts, etc. Reviewing average response time and transaction volume is fine, but a more meaningful interpretation should also take into account transaction mix and the percent of certain transaction groups that complete in a specified time period. This is information that might be pertinent to service level agreements.

The two CICS graphs exhibited here are rather straightforward. The first (Figure 11) is a variation of the standard transaction volume over time. One line represents the total number of user transactions on the prime shift for one CICS platform, and the other line represents the peak transactions for one hour. Figure 12 is a graph created ad hoc and has two vertical axes. One shows the percent of transactions completing in less than 1 second and the other is the total number of transactions. In both of these figures the transactions are user transactions.

SUMMARY

Space limitations have prevented us from going into more detail and showing more graphs. Be that as it may, what has been described here can be reduced to two parts: building the database, and then using the database to aid in analysis and reporting. One of the fundamental features is that many other graphs can be produced with little effort once the data has been stored in the database. To zero in on a particular situation, one would read the appropriate member(s) from the SAS database for the time period desired, then add statements to new or existing code to exclude the undesired times and dates. Graphs can easily be created for special situations (e.g. Figure 12). One of the items we produce is a weekly report.
It is in the form of a packet with a table of contents, printed reports, and a series of graphs, some of which have been shown here. Everything is done with SAS. This includes the cover and the table of contents, which are done using the GSLIDE PROC. We also have an on-line version of the report, created by storing the graphs in a graphics catalog. To access these graphs all one has to do is enter a CLIST name from TSO. This will display a menu of the graphs available. (This is accomplished with the GREPLAY PROC.) The user then selects the desired graphs from the menu.

Author contact:
Thomas G. Confrey
(203) 566-4362
State of Connecticut Data Center
340 Capitol Ave., 4th Floor
Hartford, CT 06016

ACKNOWLEDGMENTS

We would like to thank our colleagues for their support, especially Walt Silva for extracting TMON data and for his help in navigating through the panels of SAS version 6. Thanks also go to Bill Jurgens for help in using and graphing DCOLLECT data. We are particularly indebted to Ellie Rosenbaum for her programming help, suggestions, and advice.

REFERENCES AND NOTES

1. The CBT/MVS Mods Tape was formerly handled by Mr. Arnold Casinghino of CBT. It can now be obtained from: National Systems Programmers Association, Inc. (NaSPA), (414) 423-2420; or Mr. Fred Robinson, (305) 284-6257, SHARE Program Library Agency, University of Miami, 146 Memorial Drive, Coral Gables, FL 33124; or you may contact the authors.
2. IBM MVS/ESA Resource Measurement Facility (RMF) Monitor I and II Reference and User's Guide, LY28-1007; IBM MVS/ESA Resource Measurement Facility (RMF) Monitor III Reference and User's Guide, LY28-1008.
3. IBM MVS/ESA System Programming Library: System Management Facilities (SMF), GC28-1819.
4. DFSMS Planning and Reporting With DCOLLECT, IBM, GG24-3540.

* SAS is a registered trademark of SAS Institute, Cary, NC, USA.
* Landmark and The Monitor for CICS are trademarks of Landmark Systems Corp.
* IBM is a registered trademark, and MVS/ESA, CICS, MVS/XA, PR/SM, RMF, MVS/DFP, 3090, SMS, SMF, DCOLLECT and CMF are trademarks, of International Business Machines Corp.

Similar resources

CAPACITY PLANNING USING CAPTURE/MVS, BEST/1 AND SAS

In this paper, we present a method for capacity planning which illustrates the mutually complementary roles of CAPTURE/MVS, BEST/1 and SAS. The first two of these are software products developed by BGS Systems, Inc.; they provide the ability to extract and analyze MVS measurement data, and to project the performance impacts of system or workload changes. In this study, SAS is used in conjunct...

Performance and Capacity Planning Considerations for the SAS System on

The objectives of this tutorial are to acquaint you with performance-sensitive SAS system options in the MVS and VM/SP environments and with other external tuning techniques. Your choice of option settings and tuning decisions can substantially affect the resource requirements required to run SAS programs. We will also go over a basic methodology for estimating resource requirements of SAS appl...

Getting from SAS 9.1.3 to SAS 9.2: Migration Tools or Promotion Tools

If you are running a metadata server in your SAS ® 9.1.3 environment, you must upgrade your metadata when you move to SAS ® 9.2. There are two possible approaches to take. The first approach is a migration, which will essentially copy your SAS 9.1.3 environment over to SAS 9.2 as part of your SAS 9.2 installation process. The second approach is to make a fresh start by installing SAS 9.2 first,...

AUTOMATED WORKLOAD FORECASTING - A LARGE COMPLEX SAS APPLICATION

In the area of computer performance and capacity planning, large amounts of operational data often have to be analyzed. Data analysis using SAS has been documented many times over the years. However, the most critical area of capacity planning, forecasting resource requirements, has received minimal attention as published papers or even as an area of development by most installations. Forecastin...

HKIA SAS: A Constraint-Based Airport Stand Allocation System Developed with Software Components

SAS is an AI application developed for the Hong Kong International Airport (HKIA) at Chek Lap Kok. SAS uses constraint-programming techniques to assign parking stands to aircraft and schedules tow movements based on a set of business and operational constraints. The system provides planning, real-time operation, and problem solving capabilities. SAS generates a stand allocation plan that finely...

Using SAS Software to Analyze Sybase Performance on the Web

This paper provides a web-based system using SAS, HTML and CGI/PERL to provide rudimentary and complex Sybase DBMS performance metrics for Unix based system operations. Sybase SQL Server performance data is collected by Sybase Historical Server allowing for the collection of performance information with minimal impact on the server. The SAS System (Base SAS, Macro, STAT and SAS/Graph) is especi...

Publication date: 2010